1. INTRODUCTION 

Irony has been shown to be ubiquitous in social media, posing a major challenge to the field of sentiment analysis [1]. It is a cognitive phenomenon in which affect-related features play a significant role. In a data-driven world, the volume of data is increasing exponentially day by day [2], and machine learning and deep learning algorithms play a vital role in analyzing this massive data and extracting knowledge from it. Creative and figurative expressions such as irony and sarcasm are widely used in user-generated content on social media sites like Twitter and Facebook [3]. Irony is the use of language that normally signifies the opposite in order to express one's meaning, usually for humorous or emphatic effect. Despite their significant differences in connotation, the terms sarcasm and irony are frequently used interchangeably. Accurate irony identification is crucial in marketing research: because irony usually inverts polarity, failing to recognize it may lead to poor sentiment classification results [4]. Intelligence services must also be able to identify irony in order to separate perceived risks from ironic statements. Irony identification is a more complex problem than most natural language processing (NLP) tasks. Irony often manifests itself as polarized sentiment, which is common on Twitter. For example, in “I really love this year’s summer; weeks and weeks of awful weather”, irony results from a polarity inversion between two evaluations: the literal evaluation “I really love this year’s summer” is positive, while the intended one, implied by the context “weeks and weeks of awful weather”, is negative.
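The polarity inversion in this example can be made concrete with a deliberately naive sketch. It assumes a tiny hand-made sentiment lexicon and splits the tweet into clauses on ";"; the word lists and the `clause_polarity` / `has_polarity_contrast` helpers are hypothetical illustrations, not part of the proposed model. A positive clause adjacent to a negative one is flagged as a possible irony cue.

```python
# Toy illustration of polarity inversion between two clauses of a tweet.
# Hypothetical lexicons and helpers; real systems learn such cues from data.
POSITIVE = {"love", "great", "wonderful", "enjoy"}
NEGATIVE = {"awful", "terrible", "horrible", "hate"}


def clause_polarity(clause: str) -> int:
    """Return +1, -1 or 0 depending on which lexicon dominates the clause."""
    words = [w.strip(".,!?;:'\"").lower() for w in clause.split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)


def has_polarity_contrast(tweet: str) -> bool:
    """Flag tweets whose adjacent clauses (split on ';') have opposite polarity."""
    polarities = [clause_polarity(c) for c in tweet.split(";") if c.strip()]
    return any(p * q < 0 for p, q in zip(polarities, polarities[1:]))


tweet = "I really love this year's summer; weeks and weeks of awful weather"
print([clause_polarity(c) for c in tweet.split(";")])  # [1, -1]
print(has_polarity_contrast(tweet))  # True
```

A lexicon heuristic like this misses irony without explicit sentiment words, which is one reason the learned representations discussed below are preferred.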


2. RELATED WORK 

Transformers have the ability to learn longer-term dependencies, but in the context of language modelling they are constrained by a fixed-length context. Transformer-XL (extra long) extends the length of the dependencies that can be learned without disrupting sequential coherence [5]. Bidirectional encoder representations from transformers (BERT) is designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers [6]. Dai and Le present two approaches for improving sequence learning with recurrent networks using unlabeled text. The first is to predict what comes next in a sequence; the second is to use a sequence autoencoder, which reads the input sequence and predicts (reconstructs) it again [7]. Howard and Ruder propose an effective transfer learning approach that can be applied to any NLP task [8]. A deep bidirectional language model pre-trained on a large text corpus can be added directly on top of existing models, considerably improving performance on downstream NLP tasks [9]. Suggestion data mining is an emerging and challenging NLP topic that aims to identify user suggestions and recommendations on web forums [10]. An architecture with 8x8 encoder and decoder layers and an attention mechanism enables parallel processing and reduces training time; the attention mechanism helps the model pay special attention to each word and its position in the sentence [11]. Xie et al. propose an n-dimensional linkage approach for incorporating aspect relationships into deep neural networks for aspect value estimation [12].
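To illustrate how an attention mechanism weighs each word against every other word, here is a minimal pure-Python sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. It assumes the standard Transformer formulation; the toy 2-dimensional token vectors are made up and are used as queries, keys and values at once (self-attention).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of real-valued scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Return (outputs, weights) for lists of query, key and value vectors."""
    d_k = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        w = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(wj * v[i] for wj, v in zip(w, values)) for i in range(len(values[0]))]
        outputs.append(out)
        weights.append(w)
    return outputs, weights


# Three made-up "tokens" with 2-dimensional embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = scaled_dot_product_attention(tokens, tokens, tokens)
for row in attn:
    print([round(x, 3) for x in row])
```

Each row of `attn` sums to 1 and shows how strongly one token attends to every token in the sequence; because all rows are computed independently, they can be evaluated in parallel.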


3. PROPOSED METHODOLOGY 

The proposed bidirectional long short-term memory-DistilBERT (BiLSTM-DistilBERT) framework consists of a stack of layers: sentence embedding, transformer, BiLSTM [13], concatenation, pooling and, finally, softmax classification layers [14], [15], as shown in Figure 1. A sentence, represented as $S = \{S_1, S_2, S_3, \ldots, S_n\}$, is fed into the pre-trained DistilBERT transformer layer, followed by a BiLSTM recurrent neural network [16]. A pooling mechanism is applied to the concatenated DistilBERT and BiLSTM outputs, and the pooled representation is finally routed through a fully connected softmax layer.
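The data flow of the classification head described above can be traced with a small pure-Python sketch. It assumes, for illustration only, that DistilBERT and the BiLSTM each emit one vector per token, that the two are concatenated token by token, that pooling is global max pooling over tokens, and that a single dense layer produces two class scores; the tiny dimensions and weights are hypothetical stand-ins (real DistilBERT hidden states are 768-dimensional), and an actual implementation would use Keras layers.

```python
import math


def concat_tokens(bert_states, lstm_states):
    """Concatenate the DistilBERT and BiLSTM vectors of each token."""
    return [b + l for b, l in zip(bert_states, lstm_states)]


def global_max_pool(token_vectors):
    """Max over the token axis, giving one fixed-size sentence vector."""
    return [max(column) for column in zip(*token_vectors)]


def dense(vector, weights, bias):
    """Fully connected layer: one output score per row of `weights`."""
    return [sum(w * x for w, x in zip(row, vector)) + b for row, b in zip(weights, bias)]


def softmax(scores):
    """Squash arbitrary real-valued scores into probabilities summing to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical stand-in outputs for a 3-token tweet: 2-dim "DistilBERT" states
# and 2-dim BiLSTM states (forward and backward halves of size 1 each).
bert_states = [[0.2, -0.1], [0.5, 0.3], [-0.4, 0.9]]
lstm_states = [[0.1, 0.7], [-0.2, 0.4], [0.6, -0.3]]

features = global_max_pool(concat_tokens(bert_states, lstm_states))
weights = [[0.3, -0.2, 0.5, 0.1], [-0.1, 0.4, -0.3, 0.2]]  # hypothetical: 2 classes x 4 features
bias = [0.0, 0.0]
probs = softmax(dense(features, weights, bias))
print(features)  # [0.5, 0.9, 0.6, 0.7]
print(probs)
```

Pooling over the token axis is what turns a variable-length tweet into a fixed-size vector that a dense softmax layer can classify.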
Figure 1. The Proposed BiLSTM-DistilBERT framework 


These observations support the intuition behind our proposed model: pre-trained deep neural networks play a vital role in downstream NLP tasks. We therefore propose an end-to-end model that employs transfer learning by selecting a pre-trained DistilBERT uncased model as the base and extending it with an additional BiLSTM recurrent neural network [17], [18]. This is effective because the pre-trained model's weights encode a high-level understanding of the English language, so we can build on that general knowledge by adding layers whose weights come to represent task-specific understanding of what makes a tweet ironic or non-ironic [19], [20]. The Hugging Face Transformers library makes transfer learning very approachable; our general workflow can be divided into four main stages, namely (1) input embedding, (2) defining the model architecture, (3) training the classification layer weights, and (4) fine-tuning DistilBERT and training all weights.

The Hugging Face application programming interface makes it easy to convert words and sentences into sequences of tokens; these tokens are then converted into tensors by the text vectorization class, and the tensors are fed into our model. Once the tokenizer object is instantiated, we encode the training, validation and test sets in batches using the tokenizer's batch_encode_plus() method. Important arguments set in this step are max_length, which controls the maximum number of tokens kept from a given text; padding and truncation, which adjust each input to max_length; and the attention mask, which tells the model which tokens to attend to and which (padding) tokens to ignore. Including the attention mask as an input to our model helps improve its performance. Because the pre-trained model is extended with BiLSTM units, which can capture long-range dependencies among tokens, the proposed model can learn the semantics of each input with respect to the specific task. The outputs of the LSTM units are concatenated and passed through a feedforward network with maximum kernel size, followed by a pooling layer; finally, the output softmax layer uses the softmax function to squash the vector of arbitrary real-valued scores into class probabilities that sum to 1.
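The effect of the max_length, padding/truncation and attention-mask arguments can be sketched without the Hugging Face library. The `encode_batch` helper below is hypothetical: it operates on token IDs assumed to be already produced by a tokenizer, assumes a padding ID of 0 with right-side padding, and ignores special tokens such as [CLS] and [SEP]. It mirrors conceptually, not exactly, the `input_ids` and `attention_mask` that batch_encode_plus() returns.

```python
PAD_ID = 0  # assumed padding token ID


def encode_batch(batch_token_ids, max_length):
    """Truncate or pad each sequence to max_length and build attention masks.

    Hypothetical helper: special tokens ([CLS]/[SEP]) are not handled here;
    a real tokenizer inserts them and truncates around them.
    """
    input_ids, attention_mask = [], []
    for ids in batch_token_ids:
        ids = ids[:max_length]  # truncation
        n_pad = max_length - len(ids)
        input_ids.append(ids + [PAD_ID] * n_pad)  # right-side padding
        attention_mask.append([1] * len(ids) + [0] * n_pad)  # 1 = attend, 0 = ignore
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Made-up token IDs for three tweets of different lengths.
batch = [[7, 21, 42, 9], [7, 15, 9], [7, 3, 4, 5, 6, 8, 9]]
encoded = encode_batch(batch, max_length=5)
print(encoded["input_ids"])       # [[7, 21, 42, 9, 0], [7, 15, 9, 0, 0], [7, 3, 4, 5, 6]]
print(encoded["attention_mask"])  # [[1, 1, 1, 1, 0], [1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```

Every row now has the same length, so the batch can be stacked into a single tensor, while the zeros in the mask keep the padding positions from influencing attention.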


4. RESULT AND DISCUSSION 

The proposed model is implemented using Keras [21], an open-source software library that provides a Python interface for artificial neural networks and acts as an interface for the TensorFlow library. In the binary classification challenge, tweets are classified as irony or not irony. For binary classification, we trained the model for 30 epochs using the Adam optimizer and the sparse categorical cross-entropy loss function [22].
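Sparse categorical cross-entropy takes integer class labels rather than one-hot vectors: for each tweet it is the negative log of the probability that the softmax layer assigns to the true class, averaged over the batch. The minimal pure-Python sketch below shows that computation; the function name, the clipping constant `eps` and the example probabilities are our own assumptions, not the Keras API.

```python
import math


def sparse_categorical_crossentropy(labels, probs, eps=1e-7):
    """Mean of -log p(true class) over a batch (illustrative, not Keras's code).

    labels: integer class indices (e.g. 0 = not irony, 1 = irony).
    probs:  softmax outputs, one probability list per example.
    eps:    clips probabilities away from 0 to avoid log(0).
    """
    losses = [-math.log(max(p[y], eps)) for y, p in zip(labels, probs)]
    return sum(losses) / len(losses)


labels = [0, 1, 1]
probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
print(round(sparse_categorical_crossentropy(labels, probs), 4))  # 0.4149
```

The third example, whose true class receives only 0.4, contributes most of the loss; confident correct predictions contribute almost nothing.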


4.1. Dataset 

SemEval-2018 Task 3, Irony Detection in English Tweets, is a shared task on irony detection [23]: given a tweet, automatic NLP systems should determine whether the tweet is ironic (Task A) and which type of irony, if any, is expressed (Task B). The ironic tweets were collected using irony-related hashtags (i.e. #irony, #sarcasm, #not) and were subsequently manually annotated to minimize the amount of noise in the corpus. For both tasks, a training corpus of 3,834 tweets was provided, as well as a test set containing 784 tweets. Table 1 shows the number of irony and not irony samples for the binary classification task, and Table 2 shows the dataset split for training, validation and testing for the multi-class classification task.


Table 1. Binary classification dataset split (number of tweets) for training, validation and testing 


Class        Training   Validation   Testing
Not irony    1,545      369          455
Irony        1,506      395          329


Table 2. Multi-class classification dataset split (number of tweets) for training, validation and testing 


Class          Training   Validation   Testing
Not irony      1,534      382          473
Clash irony    1,088      295          164
Situational    263        53           85
Others         168        54           62


4.2. Experimental results 

The SemEval-2018 irony dataset provides only training and testing samples, so we divided the training samples into training and validation sets in an 80:20 ratio [24]. Tables 1 and 2 show the resulting splits for the training, validation and testing phases. For binary classification, we achieved a maximum training accuracy of 98% and a validation accuracy of 69%. On the testing samples, the model achieves a precision of 81% for the not irony class and 66% for the irony class, a recall of 77% for not irony and 72% for irony, and an F1 score of 79% for not irony and 69% for irony. Table 3 shows the precision, recall and F1 score on the testing samples for binary classification. Our model performs better at classifying not irony tweets than irony tweets.
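The 80:20 division of the original training set can be sketched as a stratified split, so that each class keeps roughly the same proportion in the training and validation sets. The text does not state whether the authors stratified by class; the `stratified_split` helper, the fixed seed and the toy labels below are assumptions for illustration. In practice a library routine such as scikit-learn's `train_test_split` with its `stratify` argument would typically be used.

```python
import random
from collections import defaultdict


def stratified_split(samples, labels, val_fraction=0.2, seed=42):
    """Hypothetical helper: split (sample, label) pairs into train/validation per class.

    Assumption: the authors' exact splitting procedure is not stated.
    """
    by_class = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_class[label].append(sample)

    rng = random.Random(seed)  # fixed seed for a reproducible split
    train, val = [], []
    for label in sorted(by_class):
        items = by_class[label][:]
        rng.shuffle(items)
        n_val = round(len(items) * val_fraction)
        val += [(s, label) for s in items[:n_val]]
        train += [(s, label) for s in items[n_val:]]
    return train, val


# Toy corpus: 50 "not irony" (0) and 40 "irony" (1) tweet IDs.
samples = list(range(90))
labels = [0] * 50 + [1] * 40
train, val = stratified_split(samples, labels)
print(len(train), len(val))  # 72 18
```

Stratifying matters most for the multi-class task, where the situational and other irony classes are small enough that a purely random split could leave the validation set with very few of them.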
Table 3. Precision, recall and F1 score for testing samples for binary classification 


Class        Precision (%)   Recall (%)   F1 score (%)
Not irony    81              77           79
Irony        66              72           69


Figure 2 shows the accuracy and loss on the training and validation sets during training; as shown in Figure 2, the training loss is always higher than the validation loss. Figure 3 shows the confusion matrix for binary classification: a total of 168 not irony samples are classified as irony, and a total of 108 irony samples are classified as not irony. Figure 4 shows the receiver operating characteristic (ROC) curve and the area under the curve (AUC) for binary irony classification. The ROC curve shows the performance of a classification model under various threshold settings. The AUC of our model is 0.72.
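The AUC can be computed without drawing the curve: it equals the probability that a randomly chosen irony tweet receives a higher predicted irony score than a randomly chosen not irony tweet, with ties counted as one half. The minimal pure-Python sketch below uses this pairwise (Mann-Whitney) formulation; the labels and scores are made up and are not the model's outputs.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 for the positive class (irony), 0 for the negative class.
    scores: predicted probability of the positive class.
    O(P * N) pairwise comparison; adequate for a few thousand tweets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5  # ties count as half
    return correct / (len(pos) * len(neg))


labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.5, 0.3, 0.2]
print(roc_auc(labels, scores))  # 8 of 9 pairs ranked correctly
```

Because it only depends on the ranking of scores, the AUC summarizes performance over all decision thresholds at once, which is what the ROC curve in Figure 4 visualizes.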
Figure 2. Training and Validation loss and accuracy for 30 epochs in Binary classification 
Figure 3. Confusion matrix for binary classification 
Figure 4. AUC-ROC curve for irony binary classification 
In the multi-class classification challenge, tweets are classified into four classes: not irony and three irony categories, namely clash irony, situational irony and other irony. Since the multi-class dataset is derived from the binary classification dataset by further categorizing the irony tweets, it is not balanced. For multi-class classification, we trained the model for 30 epochs using the Adam optimizer and the sparse categorical cross-entropy loss function.

Table 4 shows the testing phase results for multi-class classification. Figure 5 shows the accuracy and loss on the training and validation sets during training. We achieved a maximum training accuracy of 87% and a validation accuracy of 66%. On the testing samples, the proposed model achieves a precision of 73% for not irony, 57% for clash irony and 80% for situational irony; a recall of 99% for not irony, 10% for clash irony and 6% for situational irony; and an F1 score of 84% for not irony, 18% for clash irony and 12% for situational irony. Our model performs better at classifying not irony tweets than the different irony classes, and tweets of the other irony class are not classified properly. Figure 6 shows the confusion matrix for multi-class irony classification. The proposed model classified a total of 553 samples as not irony, 111 samples as clash irony, 56 samples as situational irony and 36 samples as other irony.
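Per-class precision, recall and F1 scores like those reported for the multi-class task can be read directly off a multi-class confusion matrix: for class k, the diagonal entry gives the true positives, the rest of column k the false positives, and the rest of row k the false negatives. The sketch below assumes rows are true classes and columns are predicted classes; the matrix values are made up for illustration and are not the counts shown in Figure 6.

```python
CLASSES = ["not irony", "clash irony", "situational irony", "other irony"]


def per_class_metrics(cm):
    """Return {class index: (precision, recall, f1)} for a square confusion matrix.

    cm[i][j] = number of tweets whose true class is i and predicted class is j.
    """
    n = len(cm)
    results = {}
    for k in range(n):
        tp = cm[k][k]
        fp = sum(cm[i][k] for i in range(n)) - tp  # predicted k, but truly another class
        fn = sum(cm[k]) - tp                        # truly k, but predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        results[k] = (precision, recall, f1)
    return results


# Made-up 4x4 confusion matrix (rows = true class, columns = predicted class).
cm = [
    [40, 5, 3, 2],
    [10, 20, 5, 5],
    [4, 2, 10, 4],
    [3, 3, 2, 12],
]
res = per_class_metrics(cm)
for k, (p, r, f) in res.items():
    print(f"{CLASSES[k]:<18} P={p:.2f} R={r:.2f} F1={f:.2f}")
```

A class that the model rarely predicts, such as a minority irony type, can show high precision but very low recall, which drives its F1 score down; the guards against zero denominators cover classes that are never predicted at all.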


Table 4. Testing phase results for all four classes 
Figure 5. Loss and accuracy for training and validation samples during training phase of multi class irony classification 
Figure 6. Confusion matrix for multi class irony classification 


4.3. Evaluation metrics 
In this research work, we used the confusion matrix to evaluate the performance of the proposed model on the fine-grained irony classification task of SemEval-2018 Task 3. Equations (1) to (3) define the precision, recall and F1 score of the proposed hybrid neural network model, computed from the confusion matrix counts [25]. The F1 score combines precision and recall and is defined as their harmonic mean.


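$$\text{Precision}(P) = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalsePositive}} \tag{1}$$

$$\text{Recall}(R) = \frac{\text{TruePositive}}{\text{TruePositive} + \text{FalseNegative}} \tag{2}$$

$$\text{F1-Score} = 2 \times \frac{P \times R}{P + R} \tag{3}$$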
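As a worked check of (3), plugging the precision and recall values reported in Table 3 into the harmonic mean reproduces the reported F1 scores after rounding. The same helpers also evaluate (1) and (2) from raw counts; the counts in the last lines are hypothetical.

```python
def precision(tp, fp):
    """Equation (1): fraction of predicted positives that are correct."""
    return tp / (tp + fp)


def recall(tp, fn):
    """Equation (2): fraction of actual positives that are found."""
    return tp / (tp + fn)


def f1_score(p, r):
    """Equation (3): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)


# Precision and recall (%) reported in Table 3 for the binary task.
reported = {"not irony": (81, 77), "irony": (66, 72)}
for name, (p, r) in reported.items():
    print(name, round(f1_score(p, r)))  # not irony 79, then irony 69

# Hypothetical counts: 70 true positives, 30 false positives, 20 false negatives.
p, r = precision(70, 30), recall(70, 20)
print(round(p, 3), round(r, 3), round(f1_score(p, r), 3))
```

Because the harmonic mean is dominated by the smaller of the two values, a class with very low recall (such as clash irony at 10%) gets a low F1 score even when its precision is moderate.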


5. CONCLUSION AND FUTURE SCOPE 

In this research work, we proposed a BiLSTM-DistilBERT hybrid neural network model to address the fine-grained irony classification task on the SemEval-2018 Task 3 dataset. Transformers are used to minimize the data preprocessing and feature extraction effort. Through a transfer learning approach, our proposed BiLSTM-DistilBERT model achieves state-of-the-art results on the SemEval-2018 Task 3 dataset. In future work, transformers other than DistilBERT could be used to extract features from tweets, and classical supervised machine learning classifiers such as the support vector machine could also be explored.